Supervised Spoken Document Summarization Jointly Considering Utterance Importance and Redundancy by Structured Support Vector Machine

Authors

  • Hung-yi Lee
  • Yu-Yu Chou
  • Yow-Bang Wang
  • Lin-Shan Lee
Abstract

In extractive spoken document summarization, the aim is to select important utterances from a document to form the summary while avoiding redundancy among the selected utterances, but balancing these two goals is not easy. This paper proposes a supervised spoken document summarization approach based on a structured support vector machine (SVM), in which the two goals are considered jointly during training. A set of parameters that both describes how to evaluate the importance of the utterances and minimizes redundancy is learned directly from the training set. Encouraging results were obtained on a lecture corpus in preliminary experiments.
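The trade-off between importance and redundancy described in the abstract can be illustrated with a small sketch. This is not the paper's structured SVM: the function names (`cosine`, `summary_score`, `greedy_select`), the cosine-similarity redundancy penalty, the hand-set `redundancy_weight`, and the greedy search are all assumptions made for illustration. In the paper the corresponding parameters are learned from the training set rather than fixed by hand.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summary_score(selected: List[int],
                  importance: Sequence[float],
                  vectors: Sequence[Sequence[float]],
                  redundancy_weight: float) -> float:
    """Total importance of the selected utterances minus a pairwise redundancy penalty."""
    total_importance = sum(importance[i] for i in selected)
    redundancy = sum(
        cosine(vectors[i], vectors[j])
        for pos, i in enumerate(selected)
        for j in selected[pos + 1:]
    )
    return total_importance - redundancy_weight * redundancy


def greedy_select(importance: Sequence[float],
                  vectors: Sequence[Sequence[float]],
                  budget: int,
                  redundancy_weight: float = 1.0) -> List[int]:
    """Greedily add the utterance that most improves the joint score, up to `budget` items."""
    selected: List[int] = []
    remaining = set(range(len(importance)))
    while remaining and len(selected) < budget:
        current = summary_score(selected, importance, vectors, redundancy_weight)
        best = max(
            sorted(remaining),
            key=lambda i: summary_score(selected + [i], importance, vectors,
                                        redundancy_weight),
        )
        gain = summary_score(selected + [best], importance, vectors,
                             redundancy_weight) - current
        if gain <= 0:
            break  # no remaining utterance improves the joint score
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)


if __name__ == "__main__":
    # Toy data (made up): utterances 0 and 1 are near-duplicates, 2 is distinct.
    importance = [0.9, 0.85, 0.5]
    vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(greedy_select(importance, vectors, budget=2, redundancy_weight=1.0))
    print(greedy_select(importance, vectors, budget=2, redundancy_weight=0.0))
```

With the redundancy penalty switched on, the sketch prefers the less important but distinct utterance over the near-duplicate; with the penalty set to zero it falls back to ranking by importance alone.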


Similar resources

Supervised spoken document summarization based on structured support vector machine with utterance clusters as hidden variables

This paper presents a supervised approach for extractive summarization of spoken document considering utterance clusters in the documents as hidden variables. Utterances in important clusters may be jointly included in the summary, while those in less important clusters may be excluded as a whole. The summaries are therefore selected based on not only the conventional principle of including the...


Automatic Segmentation and Summarization of Spoken Lectures

The ever-increasing number of online lectures has created an unprecedented opportunity for distance learning. Most online lectures are presented as unstructured text, audio and/or video files which make it difficult for students to locate relevant lectures and browse through them. In this thesis, we investigated several automatic lecture segmentation and summarization algorithms. Automatic lectur...


Extractive Multi-Document Summarization with Integer Linear Programming and Support Vector Regression

We present a new method to generate extractive multi-document summaries. The method uses Integer Linear Programming to jointly maximize the importance of the sentences it includes in the summary and their diversity, without exceeding a maximum allowed summary length. To obtain an importance score for each sentence, it uses a Support Vector Regression model trained on human-authored summaries, w...


Alignment of spoken utterances with slide content for easier learning with recorded lectures using structured support vector machine (SVM)

This paper reports the first known effort to automatically align the spoken utterances in recorded lectures with the content of the slides used. Such technologies will be very useful in Massive Open On-line Courses (MOOCs) and various recorded lectures as well as many other applications. We propose a set of approaches considering the problem that words helpful for such alignment are sparse and ...


Concept-Map-Based Multi-Document Summarization using Concept Coreference Resolution and Global Importance Optimization

Concept-map-based multi-document summarization is a variant of traditional summarization that produces structured summaries in the form of concept maps. In this work, we propose a new model for the task that addresses several issues in previous methods. It learns to identify and merge coreferent concepts to reduce redundancy, determines their importance with a strong supervised model and finds...




Publication date: 2012